TopSort: A High-Performance Two-Phase Sorting Accelerator Optimized on HBM-Based FPGAs

Abstract

The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to fully utilize this immense bandwidth. First, existing sorter designs cannot be directly scaled at the increasing rate of available off-chip bandwidth, as the required on-chip resource usage grows at a much faster rate and would bound the sorting performance in turn. Second, designers need an in-depth understanding of HBM's characteristics to effectively utilize the HBM bandwidth. To tackle these challenges, we present TopSort, a novel two-phase sorting solution optimized for HBM-based FPGAs. In the first phase, 16 merge trees work in parallel to fully utilize the bandwidth of 32 HBM channels. In the second phase, TopSort reuses the logic from phase one to form a wider merge tree that merges the partially sorted results from phase one. TopSort also adopts HBM-specific optimizations to reduce resource overhead and improve bandwidth utilization. TopSort can sort up to 4 GB of data using all 32 HBM channels, with an overall sorting performance of 15.6 GB/s. TopSort is 6.7× and 2.7× faster than state-of-the-art CPU and FPGA sorters, respectively.
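The two-phase structure described in the abstract can be illustrated with a small software analogy. This is only a sketch under stated assumptions, not the TopSort hardware design: the function names (`phase_one`, `phase_two`, `two_phase_sort`) are hypothetical, Python's built-in `sorted` stands in for each merge tree, and `heapq.merge` stands in for the wider merge tree of the second phase. The default of 16 partitions mirrors the 16 merge trees mentioned in the abstract.

```python
import heapq
from typing import List


def phase_one(data: List[int], num_trees: int = 16) -> List[List[int]]:
    """Phase 1 (analogy): split the input into up to `num_trees` partitions
    and sort each one independently, as parallel merge trees would."""
    # Ceiling division so that at most `num_trees` partitions are produced.
    chunk = max(1, -(-len(data) // num_trees))
    return [sorted(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def phase_two(runs: List[List[int]]) -> List[int]:
    """Phase 2 (analogy): one wide k-way merge over the sorted partitions."""
    return list(heapq.merge(*runs))


def two_phase_sort(data: List[int], num_trees: int = 16) -> List[int]:
    """Run both phases back to back (hypothetical helper for illustration)."""
    return phase_two(phase_one(data, num_trees))
```

In this analogy the first phase maximizes parallelism across independent partitions, while the second phase trades that parallelism for a single wide merge that produces the final order.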

Similar resources

A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism

Sorting is an extremely important computation kernel that has been accelerated in many fields, such as databases, image processing, and genome analysis. Given the advent of the Internet of Things (IoT) era, driven by progress in mobile technology, the future needs a sorting method that is available in any environment, not only on high-performance systems like servers but also low computationa...

High Performance Monte-Carlo Based Option Pricing on FPGAs

High performance computing is becoming increasingly important in the field of financial computing, as the complexity of financial models continues to increase. Many of these financial models do not have a practical closed-form solution, in which case numerical methods are the only alternative. Monte-Carlo simulation is one of the most commonly used numerical methods, in scientific computing in genera...

A High Performance FPGA-Based Accelerator for BLAS Library Implementation

This paper describes the implementation and the performance analysis of a hardware accelerator for the BLAS library matrix multiplication operation. This accelerator is based on a dual-FPGA board and on an implementation of the BLAS software library that makes use of the FPGA-based hardware. In order to evaluate the performance of such a system, we implemented the matrix multiplication operation (BLAS “dg...

Optimized Implementation of RNS FIR Filters Based on FPGAs

In this paper optimized Residue Number System (RNS) arithmetic blocks to better exploit some of the architectural characteristics of the last generation FPGAs are presented. The implementation of modulo m adders, modulo m constant and general multipliers, input and output converters are presented. These architectures are based on moduli sets chosen in order to optimally use the 6-input Look-Up ...

High Performance Computing with FPGAs

Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges to find the right processing paradigm which takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this p...

Journal

Journal title: IEEE Transactions on Emerging Topics in Computing

Year: 2022

ISSN: 2168-6750, 2376-4562

DOI: https://doi.org/10.1109/tetc.2022.3228575